This is from the sixth chapter of learn.r-journalism.com.
By now you’ve inadvertently seen the different kinds of content that R Markdown can help generate.
R Markdown is R code + Markdown.
Markdown is a very easy way to add formatting to plain text. John Gruber (of Daring Fireball) created it as a simple way for non-programming types to write in an easy-to-read format that could be converted directly into HTML.
In the image below, the text on the left was interpreted as HTML on the right.
GitHub has a nice guide on Markdown.
R Markdown is a simple way to embed chunks of R code (or other languages, like Python) in Markdown documents.
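As a rough illustration of what that looks like in practice, here is a minimal sketch of an .Rmd file. The title, the chunk label, and the use of R's built-in cars dataset are placeholders chosen for this example, not part of the lesson's own files.

````markdown
---
title: "A hypothetical report"
output: html_document
---

## Findings

Regular Markdown text goes here, with *emphasis* and [links](https://example.com).

```{r speed-summary}
# An R chunk: knitr runs this code and inserts the output below it.
summary(cars$speed)
```

Inline R also works: the dataset has `r nrow(cars)` rows.
````

Everything outside the chunk is ordinary Markdown. Everything inside the chunk is ordinary R.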
On one level, it allows you to present your analysis process and results in a format that doesn’t require R to run.
But the code is still there should a researcher or fellow journalist be compelled to reproduce or add to your work.
It’s a version of literate programming. By combining your R code with documentation, it makes your programming more robust, portable, and easier to maintain.
At the click of a button, or by typing a command, you can rerun the code in an R Markdown file to reproduce your work and export the results as a finished report.
R Markdown works thanks to the knitr package, which runs code embedded in Markdown, and Pandoc, which then converts Markdown into a bunch of different output formats, like Word, PDF, HTML, etc.
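The "type of a command" route can be sketched with rmarkdown::render(). This is a minimal example under a few assumptions: the rmarkdown package and Pandoc are installed (RStudio bundles Pandoc), and the tiny document written to a temporary file is made up for the demonstration.

```r
library(rmarkdown)

# Build a code fence without typing three backticks literally.
fence <- strrep("`", 3)

# Write a tiny, hypothetical R Markdown document to a temporary file.
rmd_path <- tempfile(fileext = ".Rmd")
writeLines(c(
  "---",
  "title: \"Render demo\"",
  "output: html_document",
  "---",
  "",
  "The mean of 1 through 10 is `r mean(1:10)`.",
  "",
  paste0(fence, "{r}"),
  "summary(cars)",
  fence
), rmd_path)

# knitr runs the chunks, then Pandoc converts the resulting Markdown to HTML.
html_path <- render(rmd_path, output_format = "html_document", quiet = TRUE)

# render() returns the path of the file it created.
print(html_path)
```

Swapping output_format for "word_document" or "pdf_document" asks Pandoc for a different format from the same source file.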
R Markdown supports dozens of static and dynamic output formats.
Exporting your work into PDFs can be effective.
For this to work, be sure to get LaTeX installed first.
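One quick way to check whether LaTeX is already available is to look for a LaTeX engine on your PATH from R. This is only a sketch: it looks for pdflatex specifically, and the TinyTeX suggestion in the message is one possible route, not something this lesson prescribes.

```r
# Look for the pdflatex executable on the PATH (empty string if not found).
latex_path <- unname(Sys.which("pdflatex"))
has_latex <- nzchar(latex_path)

if (has_latex) {
  message("LaTeX found at: ", latex_path)
} else {
  # Assumption: TinyTeX is an acceptable LaTeX distribution for your setup.
  message(
    "No LaTeX engine found. One option is the tinytex package: ",
    "install.packages('tinytex'), then tinytex::install_tinytex()."
  )
}
```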
Notebooks are very popular.
This is how pandas and IPython notebooks render when uploaded to GitHub.
It renders well and matches the experience of someone coding in that environment.
How does an R Markdown file look on GitHub?
Not well.
Each .Rmd file has its own custom YAML section at the top. These are keywords that, when combined with the right packages, let knitr know how to output the .Rmd file. For example, setting toc and toc_float creates an HTML file with a self-generated, floating table of contents based on the header titles.
GitHub doesn’t have a way to interpret that, so it creates that nested image above and doesn’t even try to make a table of contents.
That’s fine, though.
R Markdown will let you output as HTML, which you can still host on GitHub Pages (which we’ll go over later). You may have to include links to the actual page, but doing so can be more effective than having a .Rmd file “render” in a GitHub repo.
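A YAML section that turns on the table of contents might look roughly like this. The title is a placeholder. The toc and toc_float options belong to the html_document output format.

```yaml
---
title: "Hypothetical report title"
output:
  html_document:
    toc: true         # build a table of contents from the headers
    toc_float: true   # keep it visible in a sidebar while scrolling
---
```

Indentation matters in YAML: toc and toc_float have to be nested under html_document for the options to apply.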
Rendering in HTML lets you add CSS and JavaScript, allowing for the inclusion of content like the interactive table below.
library(DT)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(readr)
# Load the payroll data, keep five columns, and take the first 99 rows
payroll <- read_csv("../data/bostonpayroll2013.csv") %>%
  select(NAME, TITLE, DEPARTMENT, REGULAR, OVERTIME) %>%
  filter(row_number() < 100)
## Parsed with column specification:
## cols(
## NAME = col_character(),
## TITLE = col_character(),
## DEPARTMENT = col_character(),
## REGULAR = col_character(),
## RETRO = col_character(),
## OTHER = col_character(),
## OVERTIME = col_character(),
## INJURED = col_character(),
## DETAIL = col_character(),
## QUINN = col_character(),
## `TOTAL EARNINGS` = col_character(),
## Community = col_character(),
## ZIPCode = col_integer(),
## State = col_character(),
## X15 = col_character(),
## X16 = col_character()
## )
# Render an interactive table with export buttons (DT's Buttons extension)
datatable(payroll, extensions = 'Buttons', options = list(
  dom = 'Bfrtip',
  buttons = c('copy', 'csv', 'excel', 'pdf', 'print')
))
This table was rendered with the DT package.
And we don’t have to create the code to display it.
It just worked because the folks who ported the DataTables jQuery plugin over to R wanted to make it seamless.
Pass certain arguments to the function datatable() and you can include buttons that allow reporters to download your tables as CSVs.
Can you imagine the power of that?
You’ve collected and transformed the data, and then, instead of sending reporters a huge spreadsheet, you send them a link to your report, in which they can filter things out and download the table that they’ve come up with themselves.
You can render other interactives, as well, like leaflet maps.
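For instance, a leaflet map takes only a few lines inside a chunk. This is a minimal sketch assuming the leaflet package is installed. The coordinates (roughly downtown Boston) and the popup text are illustrative, not data from this lesson.

```r
library(leaflet)

# Illustrative coordinates only; swap in your own data.
boston_map <- leaflet() %>%
  addTiles() %>%                                      # default OpenStreetMap base layer
  setView(lng = -71.06, lat = 42.36, zoom = 12) %>%   # center and zoom the view
  addMarkers(lng = -71.06, lat = 42.36, popup = "Example marker")

# Printing the object in a chunk embeds the interactive map in the HTML output.
boston_map
```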
This is a very important development for a journalist.
You see, reporters sometimes aren’t very organized.
How often have you had someone email you asking if you could resend your summarized spreadsheet to them again?
This is what their downloads or desktop folder might look like. It’s easy to see why they might have lost track of it.